Spotting Local Environments in Self-Assembled Monolayer-Protected Gold Nanoparticles

Organic–inorganic (O–I) nanomaterials are versatile platforms for an incredible high number of applications, ranging from heterogeneous catalysis to molecular sensing, cell targeting, imaging, and cancer diagnosis and therapy, just to name a few. Much of their potential stems from the unique control of organic environments around inorganic sites within a single O–I nanomaterial, which allows for new properties that were inaccessible using purely organic or inorganic materials. Structural and mechanistic characterization plays a key role in understanding and rationally designing such hybrid nanoconstructs. Here, we introduce a general methodology to identify and classify local (supra)molecular environments in an archetypal class of O–I nanomaterials, i.e., self-assembled monolayer-protected gold nanoparticles (SAM-AuNPs). By using an atomistic machine-learning guided workflow based on the Smooth Overlap of Atomic Positions (SOAP) descriptor, we analyze a collection of chemically different SAM-AuNPs and detect and compare local environments in a way that is agnostic and automated, i.e., with no need of a priori information and minimal user intervention. In addition, the computational results coupled with experimental electron spin resonance measurements prove that is possible to have more than one local environment inside SAMs, being the thickness of the organic shell and solvation primary factors in the determining number and nature of multiple coexisting environments. These indications are extended to complex mixed hydrophilic–hydrophobic SAMs. This work demonstrates that it is possible to spot and compare local molecular environments in SAM-AuNPs exploiting atomistic machine-learning approaches, establishes ground rules to control them, and holds the potential for the rational design of O–I nanomaterials instructed from data.

. Voronoi tessellation based on the center of mass of the ligands projected onto a bi-dimensional (φ, cos (θ)) plane for the shorter alkyl chain (C11/C12) nanoparticles. Each dot corresponds to a ligand center of mass. Each polygon is colored according to its value, where darker (lighter) regions represent smaller (larger) area with higher (lower) ligand density with respect to the average. a) NP1, b) NP1/6, c) NP3, d) NP3/6, e) NP5, f) NP5/6. Figure S4. Voronoi tessellation based on the center of mass of the ligands projected onto a bi-dimensional (φ, cos (θ)) plane for the longer alkyl chain (C16) nanoparticles. Each dot corresponds to a ligand center of mass. Each polygon is colored according to its value, where darker (lighter) regions represent smaller (larger) area with higher (lower) ligand density with respect to the average. a) NP2, b) NP2/6, c) NP4, d) NP4/6.   Figure S5. Normalized water distribution at increasing distance from the gold surface for NP1 and NP2 series. Color code same as in Figure 4. Figure S6. Normalized water distribution at increasing distance from the gold surface for NP3 and NP4 series. Color code same as in Figure 4. Figure S7. Normalized water distribution at increasing distance from the gold surface for NP5 serie. Color code same as in Figure 4.
The obtained nanoparticles were characterized by 1 H-NMR spectroscopy, TEM and UV-vis spectroscopy.
After decomposition of the NPs using a solution of I2 in methanol, a 2.1/1 ratio HS-MDDS/F6 was found from the integrals of the 1 H NMR spectrum. TEM: 4.5 ± 0.9 nm, TGA 15 %, Au2975MDDS239F132.  In addition, our DPD simulations contain also bonds described by a harmonic spring force , where is the stiffness of the spring and 0 is the equilibrium distance between the bonded particles ij, and a harmonic bond angle force = 1 2 sin( − 0 ), where is the spring constant and 0 the equilibrium angle between adiacent bead triples ijz in a row.
All the variables are rescaled in DPD simulations by the particle mass mi, the cutoff distance , and energy unit , where are kB is the Boltzmann factor and T is the absolute temperature.  The number of ligands was assigned to each nanoparticle to match the experimental number of chain for nm 2 .
Each ligand was placed close to the gold surface and oriented outward with the head-tail vector along the radial direction. The nanoparticle was then solvated by Packmol, 10 assuming a standard bead density ρ of 3, and placed in the middle of a 3D periodic simulation box of 32rc x 32rc x 32rc length. All Au beads were forced to move as a rigid body during the simulation time. Each configuration was first relaxed for 1x10 4 steps with a time step of t = 0.01τ. After, at least additional 6x10 6 time steps (t = 0.02τ) were performed for productive runs. Electrostatic interactions were treated by means of the smooth particle mesh Ewald method. System equilibration was assessed by monitoring temperature, pressure, density, and potential energy profile as well as composition of nearest neighbours of the sulphur beads. Each system was tested on three independently generated starting configurations.  Table S3. DPD calculations were performed using the Culgi simulation package (v.12.0, Culgi B.V., Leiden, The Netherlands). The force cutoff radius rc, the particle mass mi, and kBT were taken as units of length, mass and energy.

S3.2. Molecular dynamics (MD)
At atomistic level ligand 1-6 were prepared using antechamber and assigning gaff2 11 atom types; force field parameters for the radical probe 7 were taken from the works of Barone et al. [12][13] Partial charges were calculated applying the RESP method provided by RED server. Au-Au interactions were described with the parameters of INTERFACE 14 force field for metals. A harmonic bond was created between each sulfur atom and a gold atom within 3.3 Å with a spring constant 50.000 kJ/mol*nm 2 . 15 Ligands positions on the Au surface were mapped back from DPD calculations for NP1-5/6 (Section S3.1).
Using tleap 16 program in combination with Packmol, each system was then solvated with TIP3P water, extending at least 20 Å from each solute atom, and counterions were added to neutralize the system. A combination of steepest descent (10000 cycles) and conjugate gradient (10000 cycles), followed by a heating phase of 100 ps in NVT ensemble (integration step = 1 fs), was carried out to reach the production temperature of 300 K (or 340K). Then, density was equilibrated for at least 50 ns in NPT conditions (integration step = 2 fs, pressure 1 atm), while pressure was maintained by Berendsen barostat. Finally, we switched to Monte Carlo barostat implemented in AMBER 18 16  Analysis was mainly performed on production runs via AMBER analysis tools, and by in-house developed Python scripts. Gold size, ligand number and composition were chosen to match those found in the experiments (see Section S2 and ref. [17]) and are summarized in Table S4 for convenience of the reader.

S3.3. Voronoi tessellation and area dispersion index (ADI)
The analysis of distribution of self-assembled ligands was carried out by performing a Voronoi tessellation 18 by using the Python package scipy 19 on 400 frames taken from production runs. The Euclidean coordinates (x, y, z) of each ligand center of mass were first projected onto a bi-dimensional (φ, cos (θ)) plane; in this projected plane, a Voronoi tessellation was calculated and the boundary cells of the system were then excluded to avoid irregularities. To quantify the regularity of the computed Voronoi tessellations, a measure of dispersion denominated area dispersion index (ADI) was calculated according to the relation: where is the mean and 2 is the variance for the selected system computed on the borderless Voronoi polygons areas. This measure was calculated on each system, and to ensure commensurability a step of normalization was carried out, in which each Voronoi polygon area is normalized on the total Voronoi tessellation area of the system. Although the index is rather simple and an indication of uniformity in distributions, it should not be used for systems that show a Poisson distribution of the areas, distribution that wasn't present in any of our systems.

S3.4. Smooth Overlap of Atomic Position (SOAP)
The Smooth Overlap of Atomic Position (SOAP) 20-21 is a state-of-art, general-purpose, atom-centered, densitybased 3D fingerprint that encodes an atomic region or environment coming from an atomistic simulation. So far, the SOAP formalism has been used with considerable success in material informatics for properties prediction and structural classification, among others. [22][23][24] The key concept is the representation of the atomic density around an atom j as a sum of smeared Gaussians centered on each surrounding atom of species α where is a cutoff function going to zero within a cutoff distance , which determines the amplitude of the local environment. The atom density is expanded in terms of a basis of orthogonal radial basis functions ( ) and spherical harmonics By integration over all relative rotations, we obtain a representation of the atomic environment that is not only invariant from translations and permutations but also spherical invariant.
The cutoff radius restricts the contributions to the density to the atoms within rj i < rc. Higher rc, higher is the information encoded in the SOAP vectors, but also higher is the computational cost associated with their evaluation or transformation. A reasonable choice for describing local environments in soft matter 27 is to set the cutoff radius just after the first peak of the Radial Distribution Function (RDF) from the SOAP center of the reporter. For our systems, this led to a cutoff r1= 9.0 Å, and r2= 4.5 Å, (Figure S23).
Due to the large number of chemical species in our systems, the computed SOAP features space included 14354 (considering all atoms) and 4752 (considering only solvent molecules) features for the medium-range and short-range SOAP, respectively. The SOAP analysis was carried out extracting 400 configurations taken from the equilibrated full molecular dynamics trajectory.

S3.5. Gaussian mixtures clustering algorithm
The clustering of the local environments was performed by the Gaussian mixtures model (GMM), implemented in the scikit-learn 28 Python package. GMM is a distribution-based clustering algorithm, which parametrically fits the probability distribution of a set of points x to a sum of N multidimensional where is the fraction of the probability belonging to the Gaussian (i. e., the relative weight of the cluster i-th), is its mean and Σ its covariance The parameters , , and Σ of the model are assigned by the expectation-maximization algorithm (EM), 29 an iterative method for finding maximum likelihood (ℒ) estimates of parameters in statistical models, which for identically independently distributed data takes the form of The automatic identification of the number N of clusters (i.e. local environments) for each system is carried out by minimization of the Bayesian Information Criteria (BIC), 30 which is a complex penalty added to the loglikelihood in the form

= ln( ) − 2ℒ
where k is the number of parameters, including N, estimated by the model and n is the number of data.
Every Gaussian mixture uses a full covariance and the BIC minimization is performed on each system separately. Consistency (or quality) of the clusterization is then assessed by calculating silhouettes scores (SC),

S3.6. Similarity measure
To compare two different atomic local environments (i, j) defined by their respective SOAP vectors, a similarity measure defined from the inner product between the two atom densities was adopted. Integrated over all The distances between environments were then displayed in a distance matrix, where they were sorted by accounting of every other environment distance, thus finally leading to the similarity matrix reported in Figure 7 and S11.

S4. ELECTRON SPIN RESONANCE (ESR)
Monolayer features can be investigated by molecular probes, which are able both to enter inside the monolayer and to possess spectral features that depends on the molecular environment of the surroundings.
Functionalized benzyl tert-bytulnitroxides (BTBN), possess such characteristics and have been largely employed to characterized different type of water-soluble protected gold nanoparticles. [31][32][33][34] In the presence of water-soluble protected gold nanoparticles, when a nitroxide probe is located in the organic compartment of the monolayer, isotropic nitrogen hyperfine splitting constant, aN, is significantly smaller than that measured in water and it is possible to distinguish different EPR signals for the two different environments and thus to measure the partition equilibrium constant of the organic probe which is strictly related to the monolayer composition. The responsiveness of aN to the nature of the environment surrounding the nitroxide probe is a consequence of the resonance hybrid formulation of nitroxidic functionality in which three π electrons are distributed over the nitrogen and oxygen atomic centers. In particular, the greater the polarity of the medium surrounding the radical center, the more important is the dipolar structure (Structure B, Scheme S1) having both a larger spin density on the nitrogen and thus a larger value of the nitrogen hyperfine coupling.

SPECTROSCOPY (XPS)
In Figure S25, we report a typical series of Au 4f, C 1s and S 2p spectra, acquired on the NP2 sample, which is representative of each of the samples we have measured. The Au 4f and C 1s core level were acquired with a photon energy of hv = 410 eV, while the S 2p was acquired with a photon energy of 360 eV. The BE scale was aligned with the main component of the C 1s, which is the tabled C=C bond of the thiols (BE = 284.9). It was possible to identify the Au 4f spin-orbit doublet (spin-orbit splitting E = 3.7 eV), with the Au 4f7/2 component found at BE = 83.9 eV, a value which is compatible with the presence of Au(0). 33 Similarly, the S 2p shows a spin-orbit doublet located at BE=162.2 eV for S 2p3/2 (spin-orbit splitting E = 1.2 eV), consistent with earlier reports. 33 The determination of the organic layer thickness was carried out by extrapolating the intensity of the photoemission signal of C 1s and Au 4f spectra and applying the procedure described by Shard for nanoscopic particles, whose mean core radius is known. 2 Figure S25. Background-subtracted Au4f, S 2p and C 1s core level spectra acquired for the NP2 sample. The black dots are the experimental data, the black continuous line is the best fit to the spectrum obtained using the spectral components (indicated in color).